1. Introduction
CircTarget is a database that compiles circRNA-target RNA interactions. It integrates data from RIC-seq, KARR-seq, LIGR-seq, PARIS, and SPLASH experiments conducted across 13 human and mouse cell lines, as well as hippocampal tissues. For each circRNA-target pair, CircTarget provides detailed annotations, including basic circRNA information and disease associations, interaction intensities, predicted RNA hybrid structures, and disease-associated variants overlapping the interaction sites. The current release contains 9,935 curated interactions supported by back-splice junction (BSJ) evidence and 122,582 interactions supported by non-BSJ evidence.
2. Identification of highly circularized circRNAs
To identify highly circularized circRNAs, we analyzed total RNA-seq data from multiple sources. Public datasets for the human hippocampus (PRJNA322318) and nine cell lines were obtained from the NCBI SRA database: A549 (GSE219808), GM12878 (GSE78550), H1-hESC (GSE86189), HEK293T (GSE272077), HepG2 (GSE90229), HT29 (GSE78684), hNPC (PRJNA596331), IMR90 (GSE90257), and mESC (GSE203271 and GSE271659). Additionally, we generated in-house total RNA-seq data for K562, HeLa, hNPC-derived neurons, and mouse hippocampus. CircRNA characterization and quantification were performed using CIRIquant with default parameters. Only circRNAs with a read count ≥ 5 and a circular-to-linear ratio (CLR) ≥ 0.85 that were detected in at least two replicates were considered highly circularized.
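The filtering rule above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the `CircQuant` record and its field names are hypothetical, and it assumes that the read-count and CLR thresholds are applied per replicate before counting replicates, which the text does not state explicitly.

```python
from dataclasses import dataclass

MIN_READS = 5        # read count threshold from the text
MIN_CLR = 0.85       # circular-to-linear ratio threshold from the text
MIN_REPLICATES = 2   # minimum number of replicates from the text


@dataclass(frozen=True)
class CircQuant:
    """One quantification record (hypothetical layout, one row per circRNA per replicate)."""
    circ_id: str
    replicate: str
    read_count: int
    clr: float


def highly_circularized(records):
    """Return sorted IDs of circRNAs passing both thresholds in >= MIN_REPLICATES replicates.

    Assumption: thresholds are checked within each replicate.
    """
    passing = {}
    for rec in records:
        if rec.read_count >= MIN_READS and rec.clr >= MIN_CLR:
            passing.setdefault(rec.circ_id, set()).add(rec.replicate)
    return sorted(cid for cid, reps in passing.items() if len(reps) >= MIN_REPLICATES)
```

In practice the records would be parsed from the per-sample quantification output; the parsing step is omitted here because it depends on the file layout.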
3. Identification of circRNA targets
The circRNA-target RNA interaction pairs were identified using the following pipeline (Figure 1).
Figure 1: The circRNA-target identification pipeline.
Adapters were first trimmed from the sequencing reads using Trimmomatic v0.36. PCR duplicates were removed with in-house scripts, as previously described. Poly(N) tails at the 3′ end were clipped using Cutadapt v4.1. After filtering, the paired reads were aligned to the human or mouse pre-rRNA sequences to remove rRNA-derived reads. The remaining reads were then mapped to the human (hg19) or mouse (mm10) genome using HISAT2 v2.1.0 with default parameters to remove normally mapped reads, including those spanning RNA splice junctions. The reads left unmapped by HISAT2 were separately re-mapped to the reference genome using BWA v0.7.17 with the following parameters: bwa mem -k 12 -T 15.
Because circRNAs are co-expressed with their cognate linear RNAs in cells, we first identified circRNA-target RNA interactions from chimeric reads with one arm mapped to the circRNA BSJ site and the other mapped to the target RNA. These were termed BSJ-supporting interactions. To comprehensively map target RNAs for highly circularized circRNAs (CLR ≥ 0.85), we expanded the analysis to include chimeric reads with one arm aligned to any region of the circRNA transcript and the other to the target RNA. These were classified as non-BSJ-supporting interactions. All aligned read arms were extracted and re-mapped to the genome using HISAT2 and Bowtie2, retaining only uniquely mapped arms (MAPQ > 20) for interaction assignment.
To distinguish biologically significant interactions from background noise, we performed a Monte Carlo simulation for all BSJ-supporting and non-BSJ-supporting interactions, comparing observed interactions with simulated random interactions over 100,000 iterations. Interactions with a p-value < 0.05 were considered statistically significant. For highly circularized circRNAs, we applied more stringent criteria: only interactions supported by at least two chimeric reads and with a p-value < 0.05 were classified as high-confidence interactions.
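The significance test and the high-confidence filter can be illustrated with a small Python sketch. The null model used here is an assumption made for illustration only: target arms are randomly re-paired with circRNA arms in each iteration, and the empirical p-value is the fraction of iterations whose simulated count reaches the observed count. The function names are hypothetical, and the actual simulation behind CircTarget may differ.

```python
import random
from collections import Counter


def empirical_pvalues(chimeras, n_iter=1000, seed=0):
    """Estimate an empirical p-value for each observed circRNA-target pair.

    chimeras: list of (circ_id, target_id) tuples, one per chimeric read.
    Returns {pair: (observed_count, p_value)}.
    Assumed null model: target arms are shuffled across reads.
    """
    rng = random.Random(seed)
    observed = Counter(chimeras)
    circ_arms = [c for c, _ in chimeras]
    target_arms = [t for _, t in chimeras]
    exceed = Counter()
    for _ in range(n_iter):
        rng.shuffle(target_arms)  # random re-pairing of the two arms
        simulated = Counter(zip(circ_arms, target_arms))
        for pair, obs in observed.items():
            if simulated[pair] >= obs:
                exceed[pair] += 1
    # Add-one correction so that p is never exactly zero.
    return {
        pair: (obs, (exceed[pair] + 1) / (n_iter + 1))
        for pair, obs in observed.items()
    }


def high_confidence(results, min_reads=2, alpha=0.05):
    """Keep pairs with >= min_reads chimeric reads and p < alpha (thresholds from the text)."""
    return sorted(
        pair for pair, (n, p) in results.items() if n >= min_reads and p < alpha
    )
```

The text specifies 100,000 iterations; the default here is smaller only to keep the example fast, and `n_iter` can be raised as needed.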
4. Prediction of circRNA-target RNA hybrid duplex
To predict potential hybrid duplexes formed between circRNAs and their target RNAs, we extracted RIC-seq-captured chimeric reads with one arm aligned to the circRNA and the other to the target RNA. We then used the RNAduplex tool from the ViennaRNA package to predict RNA-RNA interactions computationally, and the optimal hybrid duplex is reported.
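As a rough illustration of this step, the Python sketch below feeds two arm sequences to the RNAduplex command-line tool and parses its result. It assumes that RNAduplex is installed and on the PATH, that it reads the two sequences from standard input one per line, and that it prints a result line of the form "structure  from,to : from,to (energy)". The helper names are hypothetical, and this is not the code used to build CircTarget.

```python
import re
import subprocess

# Assumed RNAduplex result line, e.g. "((((&))))   3,6   :  10,13  (-5.20)"
_DUPLEX_LINE = re.compile(
    r"^(?P<structure>[.()&]+)\s+"
    r"(?P<a_from>\d+),(?P<a_to>\d+)\s*:\s*(?P<b_from>\d+),(?P<b_to>\d+)\s+"
    r"\(\s*(?P<energy>-?\d+(?:\.\d+)?)\)"
)


def parse_rnaduplex_line(line):
    """Parse one RNAduplex result line into a dict (structure, arm ranges, energy)."""
    m = _DUPLEX_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized RNAduplex output: {line!r}")
    return {
        "structure": m.group("structure"),
        "circ_range": (int(m.group("a_from")), int(m.group("a_to"))),
        "target_range": (int(m.group("b_from")), int(m.group("b_to"))),
        "energy": float(m.group("energy")),  # kcal/mol
    }


def run_rnaduplex(circ_arm, target_arm):
    """Run RNAduplex on two sequences and return the parsed duplex (requires ViennaRNA)."""
    proc = subprocess.run(
        ["RNAduplex"],
        input=f"{circ_arm}\n{target_arm}\n",
        capture_output=True,
        text=True,
        check=True,
    )
    for line in proc.stdout.splitlines():
        if "&" in line:  # the structure line joins the two strands with '&'
            return parse_rnaduplex_line(line)
    raise ValueError("no duplex line found in RNAduplex output")
```

The parsed energy corresponds to the minimum free energy value reported for each duplex on the results page.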
5. Mapping of risk variants
Disease-associated variants were obtained from the GWAS Catalog, of which 78,298 were retained after manual curation. We also extracted 251,116 pathogenic or likely pathogenic variants from ClinVar. Additionally, ~80 million cancer-associated variants were sourced from the ICGC database, of which 5.1 million were retained based on quality (score > 15) and mutant allele frequency (MAF > 0.1). Finally, we intersected these risk variants with RIC-seq-derived circRNA–target RNA interaction regions using BEDTools.
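The filtering and intersection logic can be sketched in plain Python as follows. This is a simplified stand-in for the BEDTools step, not a replacement for it: the tuple layouts and function names are hypothetical, and coordinates are assumed to follow the BED convention (0-based, half-open intervals).

```python
from collections import defaultdict


def passes_icgc_filter(quality, maf, min_quality=15, min_maf=0.1):
    """Apply the ICGC thresholds from the text: quality score > 15 and MAF > 0.1."""
    return quality > min_quality and maf > min_maf


def intersect(variants, regions):
    """Report every (variant_id, interaction_id) pair where the variant lies in the region.

    variants: iterable of (chrom, pos0, variant_id), with pos0 a 0-based position.
    regions:  iterable of (chrom, start, end, interaction_id), half-open [start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, interaction_id in regions:
        by_chrom[chrom].append((start, end, interaction_id))
    hits = []
    for chrom, pos, variant_id in variants:
        for start, end, interaction_id in by_chrom.get(chrom, []):
            if start <= pos < end:
                hits.append((variant_id, interaction_id))
    return sorted(hits)
```

A linear scan per chromosome is adequate for a small example; for millions of variants, a sorted sweep or an interval tree (which is what tools such as BEDTools provide efficiently) would be the practical choice.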
6. Search page
The CircTarget web interface provides flexible options for searching the data:
1) circRNA-based search using identifiers from circBase or circAtlas, or host gene names.
2) Target-based search by gene symbol or Ensembl ID.
Figure 2: Schematic diagram showing how to search the database and read the results.
7. Results page
Section 1: CircRNA summary
This section displays detailed information for each circRNA, including alias, genomic coordinates, strand orientation, spliced length, and mature circRNA sequence—all directly extracted from circBase and circAtlas 3.0 databases. Each entry is cross-referenced with its corresponding record ID in these reference databases for easy verification and further exploration.
Section 2: Target information
This section lists the interacting targets of each circRNA. Each row represents one interaction record and includes the target gene Ensembl ID, gene symbol, gene type, number of chimeric reads supporting the interaction, p-value, cell line or tissue, species, and detection method.
Section 3: Diseases associated with circRNA
This section shows the associations between circRNAs and diseases, curated from experimentally validated interactions in the circRNADisease, CircR2Disease, and circad databases.
Section 4: Gene model and interaction sites
The gene model shows the full gene structure retrieved from the GENCODE database (version 19 for the human genome and version 25 for the mouse genome), including both coding and non-coding transcripts. Interaction regions between the target and the circRNA are highlighted as red blocks at the top of the gene model.
Section 5: Detailed interaction information
This section details circRNA-target RNA interactions, with each entry representing a specific binding site. For BSJ-supported interactions, both the 3′ and 5′ interaction regions flanking the back-splice junction are provided. If the predicted hybrid duplex involves only one side of the BSJ, a single circRNA coordinate is shown. Additional annotations include: (1) the binding region on the target RNA, (2) the number of supporting chimeric reads, and (3) the minimum free energy (MFE) of the hybrid duplex.
Section 6: Variants mapped to interactions
This section displays risk variants overlapping circRNA–target RNA interaction sites, curated from the GWAS Catalog, ClinVar, and ICGC databases.